{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![New Release: Accelerate YOLOv8](assets/yolov8.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Accelerate Ultralytics YOLOv8 with Speedster"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cfcd562",
   "metadata": {
    "id": "6cfcd562"
   },
   "source": [
    "Hi and welcome 👋\n",
    "\n",
    "In this notebook we will see how, in just a few steps, you can reduce the inference latency of a deep learning model using Speedster, a module of the open-source library nebullvm.\n",
    "\n",
    "With Speedster's latest API, you can speed up models by up to 10x without any loss of accuracy (option A), or by up to 20-30x if you set a maximum loss of accuracy that you are willing to trade for even lower latency (option B). To accelerate your model, Speedster relies on deep learning compilers (in both options) and, in option B only, on further techniques such as quantization and half precision.\n",
    "\n",
    "Let's jump to the code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%env CUDA_VISIBLE_DEVICES=0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install Speedster"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install speedster"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!python -m nebullvm.installers.auto_installer --frameworks torch --compilers all"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Install Ultralytics YOLOv8"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install ultralytics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load YOLOv8s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "from ultralytics import YOLO\n",
    "\n",
    "yolo = YOLO('yolov8s.pt')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's load a test dummy data and see the original output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_data = torch.randn(1, 3, 640, 640)\n",
    "yolo.model(test_data)"
   ]
  },
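  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The raw output printed above is long, so its nesting can be hard to read. As a minimal sketch, the helper below summarizes any nested tuple/list structure. The name `describe` and the `FakeTensor` stand-in are hypothetical (they are not part of Ultralytics or Speedster), the shapes are arbitrary, and the helper only assumes that tensor-like leaves expose a `.shape` attribute. In this notebook you could call `describe(yolo.model(test_data))` instead of using the stand-in."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def describe(obj):\n",
    "    # Containers keep their type; leaves are replaced by short text labels\n",
    "    if isinstance(obj, (tuple, list)):\n",
    "        return type(obj)(describe(item) for item in obj)\n",
    "    shape = getattr(obj, 'shape', None)\n",
    "    if shape is not None:\n",
    "        # Tensor-like leaf: report its class name and shape\n",
    "        return f'{type(obj).__name__}{tuple(shape)}'\n",
    "    return type(obj).__name__\n",
    "\n",
    "\n",
    "class FakeTensor:\n",
    "    # Hypothetical stand-in for a tensor, so this cell runs without the model\n",
    "    def __init__(self, *shape):\n",
    "        self.shape = shape\n",
    "\n",
    "\n",
    "# Same kind of nesting as a (tensor, [tensor, ...]) output; shapes are arbitrary\n",
    "fake_output = (FakeTensor(1, 4), [FakeTensor(1, 2), FakeTensor(1, 3)])\n",
    "print(describe(fake_output))"
   ]
  },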
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The original YOLOv8 model returns a tuple whose first element is a tensor and whose second element is a list of tensors. Speedster currently supports only models whose outputs are all tensors, so we need a wrapper that flattens the output into a plain tuple of tensors:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class YOLOWrapper(torch.nn.Module):\n",
    "    # Flattens the YOLOv8 output into a plain tuple of tensors\n",
    "    def __init__(self, yolo_model):\n",
    "        super().__init__()\n",
    "        self.model = yolo_model.model\n",
    "\n",
    "    def forward(self, x, *args, **kwargs):\n",
    "        res = self.model(x)\n",
    "        # res is (tensor, [tensor, ...]): unpack the list into the top-level tuple\n",
    "        return (res[0], *res[1])\n",
    "\n",
    "model_wrapper = YOLOWrapper(yolo)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## YOLOv8s Optimization on GPU"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now optimize the model using Speedster:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from speedster import optimize_model\n",
    "\n",
    "# Provide some input data for the model: 100 random samples, each paired with a dummy label\n",
    "input_data = [((torch.randn(1, 3, 640, 640), ), torch.tensor([0])) for _ in range(100)]\n",
    "\n",
    "# Run Speedster optimization\n",
    "optimized_model = optimize_model(\n",
    "  model_wrapper, input_data=input_data, metric_drop_ths=0.1, store_latencies=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can finally restore the original output format by wrapping the optimized model in a new class:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class OptimizedYOLO(torch.nn.Module):\n",
    "    # Restores the original (tensor, [tensor, ...]) output format\n",
    "    def __init__(self, optimized_model):\n",
    "        super().__init__()\n",
    "        self.model = optimized_model\n",
    "\n",
    "    def forward(self, x, *args, **kwargs):\n",
    "        res = self.model(x)\n",
    "        # The first tensor stays on its own; the remaining ones go back into a list\n",
    "        return res[0], list(res[1:])\n",
    "\n",
    "optimized_wrapper = OptimizedYOLO(optimized_model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "optimized_wrapper(test_data.cuda())"
   ]
  },
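  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To check the speedup yourself, you can time the model before and after optimization. The cell below is a minimal sketch of a timing helper built only on the Python standard library; `measure_latency` and its `warmup` and `runs` parameters are hypothetical names, not part of Speedster. The demo times a stand-in workload so the cell runs on its own. In this notebook you could pass the original model or `optimized_wrapper` together with an input on the matching device. GPU kernels run asynchronously, so for GPU timings the timed function should also call `torch.cuda.synchronize()` before returning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import statistics\n",
    "import time\n",
    "\n",
    "\n",
    "def measure_latency(fn, *args, warmup=3, runs=20):\n",
    "    # Untimed warm-up calls keep one-off setup cost out of the measurement\n",
    "    for _ in range(warmup):\n",
    "        fn(*args)\n",
    "    timings = []\n",
    "    for _ in range(runs):\n",
    "        start = time.perf_counter()\n",
    "        fn(*args)\n",
    "        timings.append(time.perf_counter() - start)\n",
    "    # The median is less sensitive to occasional slow runs than the mean\n",
    "    return statistics.median(timings)\n",
    "\n",
    "\n",
    "# Stand-in workload so this cell does not depend on the model\n",
    "latency = measure_latency(sum, range(10_000))\n",
    "print(f'median latency: {latency * 1e3:.3f} ms')"
   ]
  },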
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## YOLOv8s Optimization on CPU"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from speedster import optimize_model, save_model, load_model\n",
    "from ultralytics import YOLO\n",
    "\n",
    "yolo = YOLO('yolov8s.pt')\n",
    "model_wrapper = YOLOWrapper(yolo)\n",
    "\n",
    "# Provide some input data for the model: 100 random samples, each paired with a dummy label\n",
    "input_data = [((torch.randn(1, 3, 640, 640), ), torch.tensor([0])) for _ in range(100)]\n",
    "\n",
    "# Run Speedster optimization\n",
    "optimized_model = optimize_model(\n",
    "  model_wrapper, input_data=input_data, metric_drop_ths=0.1, store_latencies=True, device=\"cpu\"\n",
    ")\n",
    "\n",
    "optimized_wrapper = OptimizedYOLO(optimized_model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "optimized_wrapper(test_data)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b72bdf54",
   "metadata": {},
   "source": [
    "## Save and reload the optimized model"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "ada71f91",
   "metadata": {},
   "source": [
    "We can save the optimized model to disk with a single line:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "99b3a9d0",
   "metadata": {},
   "outputs": [],
   "source": [
    "save_model(optimized_model, \"model_save_path\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6308ddd7",
   "metadata": {},
   "source": [
    "We can then load the model again:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9946f6b",
   "metadata": {},
   "outputs": [],
   "source": [
    "optimized_model = load_model(\"model_save_path\")\n",
    "optimized_wrapper = OptimizedYOLO(optimized_model)"
   ]
  },
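  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After reloading, it can be worth checking that the model still produces the same output as before saving. The cell below is a minimal sketch of such a check on nested tuples/lists of plain numbers; `max_abs_diff` is a hypothetical helper, not part of Speedster, and the values are made up. With real outputs you would first convert each tensor to numbers (for example with `.flatten().tolist()`), or compare tensors pairwise with `torch.allclose`. Whether the outputs match exactly or only within a small tolerance may depend on the backend that was selected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def max_abs_diff(a, b):\n",
    "    # Walk two nested tuple/list structures in parallel and return the\n",
    "    # largest absolute difference between corresponding numbers\n",
    "    if isinstance(a, (tuple, list)):\n",
    "        if len(a) != len(b):\n",
    "            raise ValueError('outputs have different structure')\n",
    "        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)\n",
    "    return abs(a - b)\n",
    "\n",
    "\n",
    "# Made-up values standing in for outputs before saving and after reloading\n",
    "before = (1.0, [2.0, 3.0])\n",
    "after = (1.0, [2.0, 3.001])\n",
    "print(max_abs_diff(before, after))"
   ]
  },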
  {
   "cell_type": "markdown",
   "id": "d50807de",
   "metadata": {
    "id": "d50807de"
   },
   "source": [
    "What an amazing result, right?!? Stay tuned for more cool content from the Nebuly team :) "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b77ff2ac",
   "metadata": {
    "id": "b77ff2ac"
   },
   "source": [
    "<center> \n",
    "    <a href=\"https://discord.com/invite/RbeQMu886J\" target=\"_blank\" style=\"text-decoration: none;\"> Join the community </a> |\n",
    "    <a href=\"https://nebuly.gitbook.io/nebuly/welcome/questions-and-contributions\" target=\"_blank\" style=\"text-decoration: none;\"> Contribute to the library </a>\n",
    "</center>\n",
    "\n",
    "<center> \n",
    "    <a href=\"https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster#key-concepts\" target=\"_blank\" style=\"text-decoration: none;\"> How speedster works </a> •\n",
    "    <a href=\"https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster#documentation\" target=\"_blank\" style=\"text-decoration: none;\"> Documentation </a> •\n",
    "    <a href=\"https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/speedster#quick-start\" target=\"_blank\" style=\"text-decoration: none;\"> Quick start </a> \n",
    "</center>"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6 (main, Aug 30 2022, 04:58:14) [Clang 13.1.6 (clang-1316.0.21.2.5)]"
  },
  "vscode": {
   "interpreter": {
    "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
